feat(visual): reserve ViT worst-case activation memory by sufubao · Pull Request #1378 · ModelTC/LightLLM

sufubao · 2026-07-02T06:49:21Z

Summary

- Reserve and hold peak ViT activation memory during visual worker startup so co-located LLM KV-pool sizing sees the reduced available GPU memory.
- Add worst-case builders for InternVL/Qwen VL visual towers plus a manual --visual_reserved_mem_gb override.
- Publish visual reservations through shared memory and include the value in max-length diagnostics.

Tests

- python -m py_compile lightllm/common/basemodel/basemodel.py lightllm/models/qwen2_5_vl/qwen2_5_visual.py lightllm/models/qwen2_vl/qwen2_visual.py lightllm/models/qwen3_vl/qwen3_visual.py lightllm/models/vit/model.py lightllm/server/api_cli.py lightllm/server/visualserver/model_infer/__init__.py lightllm/server/visualserver/model_infer/mem_reserve.py lightllm/server/visualserver/model_infer/model_rpc.py lightllm/server/visualserver/model_infer/worst_case_reserve.py
- Pure-function check for compute_qwen_worst_case_grid upper-bound rounding
- CLI parse check for --visual_reserved_mem_gb
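The rounding check mentioned above concerns an upper bound that must never undercount. The body of compute_qwen_worst_case_grid is not shown on this page, so the following is only a hypothetical illustration of upper-bound rounding in general: the function name, its parameters, and the idea of rounding a patch count up to a multiple of the merge window are assumptions, not the PR's code.

```python
import math


def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple (never down)."""
    return -(-value // multiple) * multiple


def worst_case_patch_count(max_pixels: int, patch_size: int, merge_size: int) -> int:
    """Hypothetical upper bound on the number of ViT patches for an image.

    Assumes square patches of patch_size pixels per side that are later merged
    in merge_size x merge_size windows, so the count is rounded UP to a
    multiple of merge_size ** 2.
    """
    patches = math.ceil(max_pixels / (patch_size * patch_size))
    return round_up(patches, merge_size * merge_size)
```

A check of this kind would assert that the result is never smaller than the unrounded count and is always a multiple of the merge window.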

gemini-code-assist

Code Review

This pull request introduces a mechanism to estimate and reserve worst-case activation memory for co-located Vision Transformers (ViT) to prevent out-of-memory (OOM) errors during runtime. It implements shared memory-based communication between the visual worker and the LLM router to report reserved memory, integrates worst-case memory reservation mixins for Qwen-VL and InternVL models, and provides a manual override command-line argument (--visual_reserved_mem_gb). Feedback on the changes highlights a potential race condition during concurrent startup where the LLM worker might attempt to read the shared memory before the visual worker has initialized it, which could cause a startup crash. It is recommended to wrap this lookup in a try-except block to ensure robustness.


gemini-code-assist · 2026-07-02T06:52:07Z

+    # assumes global_rank == index into visual_gpu_ids (matching how visual ranks call publish_vit_reserved_mem)
+    for global_rank, dev in enumerate(gpu_ids):
+        if dev == device_id:
+            total += int(SharedInt(get_vit_reserved_shm_name(dev, global_rank)).get_value())


During concurrent startup of the LLM worker and the visual worker, the LLM worker may call read_vit_reserved_mem_for_device before the visual worker has published its reserved memory via publish_vit_reserved_mem. In that case the shared memory segment does not exist yet, so SharedInt will raise an exception (such as FileNotFoundError or ValueError) and the LLM worker will crash during startup. Wrapping the lookup in a try...except block lets the LLM worker skip a segment that has not been published yet instead of crashing.

Suggested change

-            total += int(SharedInt(get_vit_reserved_shm_name(dev, global_rank)).get_value())
+            try:
+                total += int(SharedInt(get_vit_reserved_shm_name(dev, global_rank)).get_value())
+            except Exception:
+                pass
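The race described in this comment can be reproduced outside LightLLM. The sketch below is not LightLLM's SharedInt; it uses the standard library's multiprocessing.shared_memory, and the helper names publish_reserved_bytes and read_reserved_bytes are hypothetical. It catches only FileNotFoundError rather than a bare Exception, on the assumption that a missing segment is the one failure worth tolerating.

```python
import struct
from multiprocessing import shared_memory

_FMT = "<q"  # one signed 64-bit integer
_SIZE = struct.calcsize(_FMT)


def publish_reserved_bytes(name: str, value: int) -> shared_memory.SharedMemory:
    """Writer side: create the segment and store the value.

    The caller must keep the returned handle and later close() and unlink() it.
    """
    shm = shared_memory.SharedMemory(name=name, create=True, size=_SIZE)
    struct.pack_into(_FMT, shm.buf, 0, value)
    return shm


def read_reserved_bytes(name: str, default: int = 0) -> int:
    """Reader side: return the stored value, or default if not yet published."""
    try:
        shm = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        # The writer has not created the segment yet.
        return default
    try:
        return struct.unpack_from(_FMT, shm.buf, 0)[0]
    finally:
        shm.close()
```

One consequence of this pattern is worth weighing: a reader that starts first sees 0 and proceeds as if nothing were reserved, which avoids the crash but can undercount the reservation. A bounded wait before falling back to the default is one possible alternative.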

Reserve peak ViT activation memory during visual worker startup so the co-located LLM router sizes its KV pool after the visual tower has already reached its worst-case allocator high-water mark. Add a manual --visual_reserved_mem_gb override for unsupported visual models and include the reserved amount in max-length diagnostics.
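As a rough illustration of how a manual override like this could take precedence over an automatic estimate, here is a minimal sketch. Only the flag name --visual_reserved_mem_gb comes from the PR; the float type, the None default, and the resolve_reserved_bytes helper are assumptions made for the example and may not match lightllm/server/api_cli.py.

```python
import argparse
from typing import Optional, Sequence

GIB = 1024 ** 3


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--visual_reserved_mem_gb",
        type=float,    # assumption: fractional GiB values are allowed
        default=None,  # assumption: None means "use the automatic estimate"
        help="Manually reserve this many GiB for ViT activations.",
    )
    return parser


def resolve_reserved_bytes(args: argparse.Namespace, estimated_bytes: int) -> int:
    """Prefer the manual override when given, else the worst-case estimate."""
    if args.visual_reserved_mem_gb is not None:
        return int(args.visual_reserved_mem_gb * GIB)
    return estimated_bytes


def parse(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

With this shape, a model without a worst-case builder could pass an estimate of 0 and rely entirely on the flag.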


sufubao force-pushed the vit-worst-case-mem-reserve branch from 870e44a to a3c5035 · July 2, 2026 06:59

sufubao wants to merge 1 commit into ModelTC:main from sufubao:vit-worst-case-mem-reserve
